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Abstract 

We propose a conjugate gradient type optimization technique for the computa- 
tion of the Karcher mean on the set of complex linear subspaces of fixed dimen- 
sion, modeled by the so-called Grassmannian. The identification of the Grass- 
mannian with Hermitian projection matrices allows an accessible introduction 
of the geometric concepts required for an intrinsic conjugate gradient method. 
In particular, proper definitions of geodesies, parallel transport, and the Rie- 
mannian gradient of the Karcher mean function are presented. We provide an 
efficient step-size selection for the special case of one dimensional complex sub- 
spaces and illustrate how the method can be employed for blind identification 
via numerical experiments. 

Keywords: conjugate gradient algorithm, Grassmannian, complex projective 
space, complex linear subspaces, Karcher mean. 



1. Introduction 

In a wide range of signal processing applications and methods, subspaces 
of a fixed dimension play an important role. Signal and noise subspaces of 
covariance matrices are well studied objects in classical applications, such as 
subspace tracking [T] or direction of arrival estimation [2]. More recently, a 
significant amount of work is focussed on applying subspace based methods 
to image and video analysis [3], as well as to matrix completion problems 0]. 
One fundamental challenge amongst these works is the study of the statistical 
properties of distributions of subspaces. Specifically, in the present work, we 
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are interested in computing the mean of a set of subspaces of equal dimension 
via averaging. 

The averaging process, considered in this paper, employs the intrinsic ge- 
ometric structure of the underlying set and is also known as the computation 
of the Karcher mean (in differential geometry, [5]), Frechet mean or barycentre 
(statistics), geometric mean (linear algebra and matrix analysis), or center of 
mass (physics). General concepts of a geometric mean have been extensively 
studied from both theoretical and practical points of view. To mention just a 
few, they include probability theory and shape spaces [6] [7], imaging [8], linear 
algebra and matrix analysis [S], interpolation |10j . and convex and differential 
geometry [TTHT2] . 

An appropriate mathematical framework is given by the so-called Grassman- 
nian, which assigns a differentiable manifold structure to the set of subspaces of 
equal dimension. Usually, this is achieved by identification with a matrix quo- 
tient spacej^] In this work, we do not follow such an approach. By following [T3] 
instead, we identify the set of subspaces of equal dimension with a set of matri- 
ces. More precisely, we consider the set of Hermitian projectors of fixed rank, 
which inherits its differentiable structure from the surrounding vector space of 
Hermitian matrices. In contrast to |14j . we consider the complex case here. The 
identification of the complex Grassmannian with Hermitian projection matrices 
allows an accessible introduction of the geometric concepts such as geodesies, 
parallel transport, and the Riemannian gradient of the Karcher mean function. 

In general, computing the Karcher mean on a smooth manifold involves a 
process of optimization, which by its own is of both theoretical and practical 
interest. Various numerical methods have been developed on the Grassmannian, 
such as a direct method [IS], gradient descent algorithms [IB], Newton's method 
[13], and conjugate gradient methods [TTlfTB] . 

In this work, we focus on the development of conjugate gradient methods. 
These methods have been proven to be efficient in many applications due to their 
trade-off between computational complexity and excellent convergence proper- 
ties. In particular, we propose an efficient step-size selection for the interesting 
case where the Grassmannian is equal to the complex projective space. More- 
over, we outline how the developed method can be employed for blind identifi- 
cation. 

The paper is organized as follows. Section [2] recalls some basic concepts in 
differential geometry, which make the present work intuitive and self-contained. 
An abstract framework of conjugate gradient methods on smooth manifolds is 
given in Section[3] In Section[4] the geometry of the Grassmannian is presented, 
followed by a detailed analysis of the of the Karcher mean function in Section [5] 
A geometric CG algorithm is given in Section [6] for the computation of the 
Karcher mean on the Grassmannian in general, together with a particularly 



4 The set of m-dimensional subspaces of C n is identified with C" xm /GL(m), cf.[[3], C" xm 
is the set of full rank (nxm)-matrices, and GL(m) are the complex invertible (mxm)-matrices. 
The equivalence relation is defined by X X = gY for some g S GL{m). 
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efficient step-size selection for the special case of the complex projective space. 
In Section [7J we outline how the proposed approach of averaging subspaces 
is evidenced to be useful in blind identification and a conclusion is drawn in 
Section 

2. Differential geometric concepts 

In this section, we shortly recall and explain the differential geometric con- 
cepts that are needed for this work. We refer to [50] for a detailed insight 
into differential and Riemannian geometry and for the formal definitions of the 
mathematical objects, and to [13] for an introduction of the topic with a focus 
on matrix manifolds. 

Strictly speaking, a manifold M is a topological space that can locally be 
continuously mapped to some linear space, where this map has a continuous 
inverse. These maps are called charts, and since charts are invertible, we can 
consider the change of two charts around any point in M as a local map from the 
linear space into itself. M is a differentiable or smooth manifold, if these maps 
are smooth for all points in M . Many data sets considered in signal processing 
are subsets of such a manifold. Important examples are matrix groups, the set 
of subspaces of fixed dimension, the set of matrices with orthonormal columns 
(so-called Stiefel manifold), the set of positive definite matrices, etc. 

To every point x in the smooth manifold M one can assign a tangent space, 
consisting of all velocities of smooth curves in M that pass x. Formally, we 
define 

T X M := {d x (0) | a(t) C M, a x (0) = x}. (1) 

Intuitively, T X M contains all possible directions in which one can tangentially 
pass through x. The elements of T X M are called tangent vectors at x. 

A Riemannian manifold M is a smooth manifold with a scalar product 
g x (-, •) assigned to each tangent space T X M that varies smoothly with x, the so 
called Riemannian metric. We drop the subscript x if it is clear from the context 
which tangent space g refers to. The corresponding norm will be denoted by || -|| g . 
The Riemannian metric allows to measure the distance on the manifold. As a 
natural extension of a straight line in the Euclidean space, a geodesic is defined 
to be a smooth curve in M that connects two sufficiently close points with 
shortest length. The length of a smooth curve a: (a, b) — > M on a Riemannian 
manifold is defined as 

L(a) = J" ^g a{t) WhW)) d*. (2) 

In Euclidean space, two velocities at different locations are both vectors in 
this space. This allows to form linear combinations and scalar products of 
these vectors. In the manifold setting, however, this is not possible, since these 
velocities are elements in different (tangent) spaces. We hence need a way to 
identify tangent vectors at x £ M with tangent vectors at y £ M if x ^ y. To 
that end, we assume that there is a unique geodesic in M that connects x and y, 
say j(t), with 7(0) = x and 7(7") = y, being possible if x, y are not too far apart. 
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The parallel transport along j(t) admits one way of identifying T X M with T y M. 
A rigorous definition is beyond the scope of this work, but loosely speaking, the 
transportation is done in such a way that during the transportation process, 
there is no intrinsic rotation of the transported vector. In particular, this leaves 
the scalar product between the transported vector and the velocity of the curve 
invariant. 

Certainly, such an identification of tangent vectors depends on the geodesic. 
Consider for example a sphere with two different geodesies connecting the south 
with the north pole (i.e. two meridians) that leave the south pole by an angle of 
7r/2. Parallel transporting the same vector along both meridians from the south 
pole to the north will result in two antiparallel vectors at the north pole. Note 
that the identification of different tangent spaces via parallel transport along a 
geodesic is just one particular instance of a more general concept termed vector 
transport in [13j . 

In order to minimize a real valued function on M, we have to extend the 
notion of a gradient to the Riemannian manifold setting. To that end, recall 
that if / : M. n — > K is smooth in x, there is a unique vector G(x) such that 



±f(x + tH)\ t=0 = G(x) T H =: (G{x),H) Eucm for all H e K", (3) 



where (-) T denotes transpose. Typically, we write V/(x) := G{x) and call it 
the gradient of / at x. This coordinate free definition of a gradient can be 
straightforwardly adapted to the manifold case. Let 



be smooth in x G M. There is a unique tangent vector G(x) € T X M such that 



for all geodesies 7 with 7(0) = x and 7(0) = H. We denote the Riemannian 
gradient as grad/ (x) := G(x). Note, that the Riemannian gradient is a tangent 
vector in the respective tangent space that depends on the chosen Riemannian 
metric. It is unique due to the Riesz representation theorem. The following 
special case is of particular interest. Let M be a submanifold of some Euclidean 
space E with Riemannian metric induced by the surrounding space, i.e. the 
Riemannian metric is obtained by restricting the scalar product from E to the 
tangent spaces. In this setting, the normal subspace is the orthogonal comple- 
ment of the tangent space. Assume furthermore, that the function / that is 
to be minimized in Q is in fact the restriction of a function / that is globally 
defined on the entire surrounding Euclidean space. If Tl x denotes the orthogonal 
projection from E onto the tangent space T X M, then the Riemannian gradient 
is just the projection of the gradient of / in E. In formulas, this reads as 



/ : M -> K 



(4) 



Df(x)H := ^(/o 7 )(t)| t=0 = g x (G(x),j(0)) 



(5) 



grad/(z)=n x V/(x). 



(6) 
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Figure 1: Illustration of the geometric conjugate gradient method. 



3. A conjugate gradient method on manifolds 

In this section, we recall one possibility of how to transfer the concept of 
conjugate gradient (CG) methods to the manifold setting. We refer to [IB] 
for a more general approach that uses retractions on manifolds. Ultimately, 
the latter approach might lead to a whole set of general methods to minimize 
Q. The CG method is initialized by some xo & M and the descent direction 
Hq := — grad /(xg) is given by the Riemannian gradient. Subsequently, sweeps 
are iterated that consist of two steps, a line search in a given direction (i.e. along 
a geodesic in that direction) followed by an update of the search direction. We 
illustrate the CG method on manifolds in Figure [2j Several different possibilities 
for these steps lead to different CG methods. Assume now that Xi, Hi, and 
Gt := grad/(a; l ) are given. 

3.1. Line search 

Given a geodesic ji with Ji(0) — Xi and ji(0) = Hi, the line search aims 
to find <Zj £ R that minimizes / o 7: t — >• E. We propose two approximations. 
The first is based on the assumption that / o 7 has its minimum near 0, which 
under certain mild conditions follows from the fact that Xi is already near the 
optimum. The step-size is chosen via a one dimensional Newton step, cf. |17j . 
i.e. 

a Nowton ._ dt (/°7)(Q|t=0 ^ 

|g^r(/°7)(t)U=o|' 

The absolute value in the denominator is chosen for the following reason. While 
being an unaltered one-dimensional Newton step in a neighborhood of a mini- 
mum the step size is the negative of a regular Newton step if 
7)W| f - < and thus yields non-attractiveness for critical points that 
are no minima, cf. |21) . 

This approach, however, uses second order information of the cost function 
and is often computationally too expensive. An alternative approach is the 
Riemmanian adaption of the backtracking line search, described in Algorithm [l] 
below. The new iterate is then given by 

x l+1 =j(a l ), (8) 
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where is either obtained by backtracking or by Eq. Q. 



Algorithm 1: Backtracking line search on manifolds 

Step 1: Choose a > 0, c,p£ (0, 1) and set a <— a; 
Step 2: repeat until (/ o 7 )(a) < /(x,) + c a g x (G 4 , Hj); 
a ,o a; 
end repeat; 
Step 3: Choose step-size a ^ acktrack := a: 



3.2. Search direction update 

In order to compute the new search direction H i+ i g T Xi+1 M, we need to 
transport Hi and Gi, which are tangent to Xi, to the tangent space T X . +1 M. 
This is done via parallel transport along the geodesic 7, which we denote by 

t: T Xi M — > T Xi+1 M. (9) 

The updated search direction is now chosen according to a Riemannian adap- 
tion of the Hestenes-Stiefel formula, or any other CG formula known from the 
Euclidean case, cf. [22]. Specifically, we have 

H i+1 = -G , J+1 +r l ri/. i , (10) 

where the most common formulas for read in the manifold setting as 

rf S = ^t^gj (Hestenes-Stiefel) 

r p R = g (G i+ i,G|+i-TGi) (Polak-Ribiere) 

r f R = (Fletcher-Reeves) (11) 

^ Y = W&^GJ) (D^-Yuan) 

* _ g(Gt +1 G i+1 -TGi) 
'i g{Hi,Gi) 

Albeit the nice performance in applications, convergence analysis of CG methods 
on smooth manifolds is still an open problem. To the best of the authors' 
knowledge, the only partial convergence result is provided in |23j . 



4. Geometry of the Grassmannian 

The complex Grassmannian Gr m „ is defined as the set of complex Tri- 
dimensional C-linear subspaces of C n . It provides a natural generalization of 
the familiar complex projective spaces. We denote the unitary group by 

\J n :={X eC nxn \X H X = I n }, (12) 
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where (-) H denotes complex conjugate transpose, and /„ is the (n x n)-identity 
matrix. For computational purposes it makes sense to identify the Grass- 
mannian Gr„ l!n with a set of self-adjoint Hcrmitian projection operators as 

Gw := {P e C" x " | P H = P,P 2 = P,tvP = to}, (13) 

i.e. the smooth manifold of rank to Hermitian projection operators of C™. 
Here, tr(-) is the trace of a matrix. In the sequel we describe the Riemannian 
geometry directly for the submanifold Gr m! „ of C" x ™. As we will see, this 
approach has advantages that simplify both the analysis and the design of CG- 
based algorithms. We begin by recalling facts about the complex Grassmannian 
PH2S]. Let 

u„:={0 e C nxn \fl H = -n} and Hcrm„:=iu„ (14) 

denote the real n 2 -dimensional vector spaces of skew-Hermitian and Hermitian 
matrices, respectively. Here we follow the terminology in group theory, with u n 
being the Lie algebra for the unitary Lie group U„. In particular, e 11 ™ = U n , 
where is the matrix exponential function. 

Theorem 1. The Grassmannian Gr m n is a real, smooth, and compact sub- 
manifold of Herm„ of real dimension 2m(n — to) + 1. Moreover, the tangent 
space at an element P £ Gr mj „ is given as 

T P Gr m .„ = {Pn - OP | Q e u„}. (15) 

It is useful for further analysis to define the linear operator 

adp : C" xn -> C" x ™, adp(X) :— [P, X] := PX — XP. (16) 



Lemma 1. for the real case) For any P € Gr m .„ the minimal polynomial 

o/adp is equal to s 3 — s. Thus adp = adp, i.e., 

adp H = [P, [P, H]] =H VHET P Gr m ,„ . (17) 

In the sequel, we will always endow Herm„ with the Frobenius inner product, 
defined by 

(X,Y) :=tr(XY). (18) 

The Euclidean Riemannian metric gp on Gr m „ induced by the embedding space 
Herm„ is defined by the restriction of ( 18 ) to the tangent spaces, i.e. 

g P (H u H 2 ) = tr{H x H 2 ) MP € Gr m ,„ and VH U H 2 € T P Gr ro ,„ . (19) 

Lemma 2. (\ 1J$ for the real case) Let P G Gr mi „ be arbitrary. The normal 
subspace at P in Herm„ is given by Np Gr m ,„ = {X — adp X \ X £ Herm„}. 
The linear map 

Up : Hcrm„ -> Hcrm„, X ^ adp X = [P, [P, X}} (20) 



is the self-adjoint Hermitian projection operator onto Tp Gr mi „ with kernel 
Np Gr m „. 
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In general, a geodesic is a minimizer of the variational problem ([2]), i.e. it 
is the solution of the corresponding Euler-Lagrange equation, the latter being a 
second order ordinary differential equation. The following result characterizes 
the geodesies on Gv m n . 

Theorem 2. ( \ IJfl for the real case) The geodesies of Gr mn are exactly the 
solutions of the second order differential equation P+[P, [P, P]] = 0. The unique 
geodesic P(t) with initial conditions P(0) = Pq £ Gr TOi „, -P(O) = H g Tp Gr mjn 
is given by 

P ff (t)=e*^ p °]p e- t[ff " Po1 . (21) 

Definition 1. We define the Riemannian exponential map as 

exp Po : T Po Gr ro , n -> Gr m ,„, H h+ P ff (l). (22) 

As outlined above, we need the concept of parallel transport along geodesies 
to give vector addition a well defined meaning. 

Lemma 3 (jUj for the real case). For P £ Gr m „ and Gq £ Tp Gr m „ the 
parallel transport of Gq along the geodesic Ph (t) is given by 

G H (t)=e t ^G e- t ^. (23) 

Note, that the complex case considered here follows by a straightforward 
adaption of the proof of the real case in [26] . 



5. Karcher mean 



5.1. The distance between two complex subspaces 

In the first step we investigate the Riemannian distance of two complex 
subspaces. For convenience, we denote the standard projector by 



X:=[VSJ- (24) 

Let P £ Gr„ l n , and assume for the moment that P is sufficiently close to X, i.e. 
that there is a unique geodesic emanating from I to P. Together with Eq. (21 1 
this implies the existence of a unique Z = Z(P) £ Bq C c(™-" l ) xrl where Bq is 
a sufficiently small open ball around the zero matrix 0, such that 



P 



r o -z" 

eLz o 



le 



o -z" 

. Z 



(25) 



Hence, Z can be considered as a function of P, implicitly defined by ( 25 ) . 
Lemma 4 (cf. Fig. ??). With 



o -Z"(P) 

z(p) o 



Z{P) := 

the geodesic distance from T to P is given by dist(X, P) 



(26) 



\{Z{P)M\ 
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Figure 2: Illustration of the geometric conjugate gradient method. 



Proof. Let j(t) = e tz ^Ie t2 ( p ) be the geodesic (21) emanating from X in 
direction ad z(P) (I) with 7 (1) = P. With j(t) = e* 1 ^ [Z(P),2~] e"*^ and 
by Eq. ^ we get 

Jo ll-tWII dt = Jo 1 ||[Z(P),I]|| dt = ||[Z(P),2]|| = Vtr[2(F),I]2 (27) 

and the result follows. □ 

Let U C Gr m n be a neighborhood around X and B C Tx Gr m „ such that 
exp x : P — >• E/ is one-one and onto. Consider the function 

/ : U -> R, P i-> dist 2 (X, P) . (28) 

To calculate the derivative we use an equivalent expression of /, namely 

j 2 



f(P) 



tr ( X ad 



Z(P) 



(2) 



(28 



Let PT € Tp Gr m; „ be an arbitrary tangent vector. For the directional derivative 
we will use the abbreviation Z' := DZ(P)H, cf. Eq. §5^. Therefore 

D/(P)P = -2tr(Tad z (ad Z /(X))) . (29) 

For computing the Riemannian gradient, we need an expression for adz 1 - To 
that end, note that Eq. ( 25 ) is equivalent to 

P = c adz < p )(I) 



(30) 



and thus differentiating ( 30 ) with respect to P in direction H yields 
D(P)P = H = e ad2 



id-( 



ad ad 



-{ad z ,))(I) 



(31) 



Here, the operators e z and - 



have to be understood via their series 



expansion and acting on the right as usual. By parallel transporting |~ £. K " | 6 



. k o 



Tx Gr m .„ with K € C^ n ™ l ) xm along the unique geodesic connecting I and P it 
is easily seen that H G Tp Gr mj „ has the representation 



H = c 



ad :- 



[OJf H l 
Lif J 



(32) 
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Using 



K := 



— K 
K 



(33) 



and ( 32 1 a lengthy but straightforward computation including the decomposition 
of the function x t— > 1 ~ c = + 1 ~ cosh g m to even and odd parts shows 



that (31) is equivalent to 



- 1 • i . 



-(ad*,) (2) 



(34) 



sinh ad a 



By assumption on P being close enough to Z, the selfadjoint operator , 
is invertible. Exploiting now the representation property of the ad-operator, 
i.e. [adx,ady] = adrx,y], as well as linearity and anti-selfadjointness, i.e. 
tr(Aads(C)) = — tr(C ads(j4)), we can conclude that (34) is equivalent to 



Z' = ( 



sinh ad 2 



(35) 



In summary, 



i ad 2 



Df(P)H = -2tr(Tad z (ad^ 

= 2trff a dx^(adx((^^)"(/C)))) 




(36) 



2tr(Z/C). 



t sinh ad 2 



The last equality in ( 36 ) is true by the self-adjointness of the operator ( £ adg — t 

- + 0{x i ) being an even function. In other words, 



and x H> 



sinh x 



1 



( S1 " a ^ dz ) 1 can be considered to act as the identity operator to the left onto Z 
under the trace. Hence, critical points are characterized by 

D/(P) = 5Rtr(Z H X) =0VK e C ( "" m)xm < 



Z = 0. 



(37) 



Together with Eq. (30) this yields that the unique critical point of / is given 
by P = 2, as one would expect]^] Moreover, from (36) we can compute the 
Riemannian gradient of the function / by 



Bf(P)H = -2tr(Z/C) = 2tr([£,2][/C,2]) 



2tr(e 2 [2,2] e" 



-z n z 



[/C,2]e- 



tr(2[Z,P]7J). 



(38) 



Since 2[2, P] £ Tp Gr m n and since the trace is the Riemannian metric we can 
conclude, cf. ([5]), that 

grad/(.P) =2[Z,P}. (39) 



5 / is defined in a neighborhood of I ensuring bijectivity of the Riemannian exponential. 
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Up to here all our computations were done sufficiently close to the standard 
projector X. As the Grassmannian is a homogeneous space, meaning that every 
point P £ Gr m „ can be transformed to any other point on Q 6 Gr m n by 
a suitable unitary matrix transformation P = QQO H , G U n , we can now 
transfer all our computations to an arbitrary element of Gr m „. Let Q £ Gr m n 
be arbitrary, and let P be sufficiently close to Q. Analogous to ( 30 1, we can then 
express P = e^^'^ Qe~^ p ' ,Q l for unique £ € Tq Gr m) „. Here, the tangent 
vector £ plays the role which [ z ] played in ( 39 1 . By a slight abuse of notation 
we consider the distance function between these arbitrary P and Q 

f : Gr m ,„ -> K, P t-> tr ££ H = tr £ 2 = dist 2 (P, Q) . (40) 



Note, that by the above described invariance, the analogue of Eq. (37) yields 
the critical point condition 

D/(P)=0^£ = 0. (41) 

Theorem 3. The Riemannian gradient, with respect to the Euclidean metric, 
of the function f , defined by GIw, is given by 



grad/(P) = 2ad K , Q] P. (42) 



Note that result (42) is in accordance with Proposition III 4.8 in [2"T] . 

Remark 1. One can derive further explicit formulas for the distance, however, 
they are less well suited for gradient computations or numerics. Let P,Q€ 
Gr mn . For any given O 6 U n such that P = 6 H I9 we define 



Qi Qi 



6Q9 H . (43) 



Let 1 > Ai > • • • > A TO > denote the eigenvalues of Qi G Herm m . The 



dist(P, Q) = y]2 YZi arccos 2 (VA7) . (44) 
Alternatively, let ■ •>/^n-m ^ denote the eigenvalues of Q3. Then 



dist(P, Q) = yj2 ^ = ; n arcsin 2 (^) . (45) 

In particular, if P,Q € Gr TOi „ with Q — YY H and Y H Y — I m , then 

idist 2 (P,Q) =tr(arccos 2 ((Y H PY) 5)) . (46) 

5.2. The Marcher mean 

We now consider a geodesically convex open balQS C Gr m n containing all, 
say N data points Qi. Note, that the Riemannian exponential map is bijective 



°I.e. all points in B can be connected by a unique shortest geodesic contained completely 
in B. E.g. for a sphere, the maximal geodesically convex open balls are open hemispheres. 
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on B and thus the results from the last section carry over to the subsequent 
analysis. Moreover, this assumption ensures that the Karcher mean is the unique 
minimizer of the function defined by ( |48| , [5 . This seems to be a sensible 
assumption in many applications, where different data might be considered to 
be different measurements of one and the same observable. Let us assume that 
PeB and thus, for each i there exists a unique £iGTQ < Gr min with 



P = exp Qj (&)=e Ki ' g4j Q 4 e 



Let the Karcher mean function now be defined as 

F:B^R, ^E^dist 2 ^, QO- 



Adaptiiig ( 28 ) and (|26| accordingly, we get 

F(P) = jf Eli tr^ 2 = -i Eli trQi Bdl 



Qi 



As a generalization of (41) we get the well known fact [SJ[H] that 
D F(P) = «=► Efci e [?1 ' Ql1 & e- Kl ' Ql] = 0. 



(47) 



(48) 



(49) 



(50) 



The interpretation of this condition is as follows. Let P be the unique critical 
point of F on B. Attaching a suitable coordinate chart around P tells us that in 
this chart P is equal to the usual Euclidean geometric mean of the data points 
Qi, expressed in exactly this chart Q The Riemannian gradient of the Karcher 
mean now follows immediately from Theorem [3] 



Theorem 4. The Riemannian gradient of F, defined by (48), is as 



gradF(P) = &££iad K , l(M P. 



(51) 



5.3. Inverse of the Riemannian exponential 

In the sequel we will present a procedure to explicitly compute the inverse 
of the Riemannian exponential. 

To that end, let Q € Gr mi „ and B C Tq Gr m n be a neighborhood of such 
that expQ : B — > expg(i?) is a bijection. Assume further that P E expg(B). 
Thus, there exists a unique £ 6 B such that 



P 



Qe 



-K,Q] 



(52) 



From the previous section, it follows that tr £ 2 = dist 2 (P, Q) . A partial task in 
our optimization procedure will therefore be to compute £ as a function of P 
for a given Q in (52). Let 9 e U„ such that Q = 9I6 H . It follows that 



e H P@ = e^-X-e-^ =:P 



Pll Pl2 
Pl2 ?22 



(53) 



7 Such a chart is called Riemannian normal coordinate chart in the literature. 
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with £ of the form 



-z n 

Z 



(54) 



Using a singular value decomposition (SVD) as Z H = ITE T V H we arrive at the 
representation 



u" o 

V H 



-£' 



-£' 



P[U " ]= c Ls o J -Z-e Ls o 




■I- 



co S v / ETs sTBinVSgF 



sin VSS T 



cos 2 VS T S 
cos VES T sin VES T , 



VSE T 

t cos VSSTsin VSS T 
VSS T 

sin 2 VSS T 



£ cos VSS t 



(55) 



cos 2 VE t E 
sin2v / SS T 



^ T sin2v / SS T 



2V££ T 



2\/££ T 
S sin 2 VES T 



The above matrix valued trigonometric functions have to be interpreted via 
their series expansion. To be more precise, the matrix E is a rectangular diago- 
nal matrix, and therefore EE T , E T E, and their corresponding square roots are 
diagonal (but square) as well. For i < m the ii— th entry of cos 2 VE T E equals 
the squared cosine of the i— th singular value of E, i.e. it equals cos 2 <7i, and 



equals 1 otherwise. The right-hand side of ( 55 ) can be efficiently computed via 
a CS-decomposition 28 of Y, where P = YY H with Y Y = I m , or equivalently 
by an SVD of P 12 . Using inverse trigonometric functions, £ can now be con- 
structed from (54). Ultimately, we have explicitly constructed the inverse of 



the Riemannian exponential map on Gr r , 



6. A conjugate gradient algorithm for computing the Karcher mean 
on the Grassmannian 

In the last sections, all ingredients have been derived for a geometric CG 
algorithm as described in Section [3j Here, we focus on its implementation 
and provide an explicit pseudo code for computing the Karcher mean of a set 
of complex subspaces. Note that a projector P can be uniquely expressed as 
P = XX H , where X is an element of the complex Stiefel manifold 

St m ,„ := {X G C nxm \X H X = I m }. (56) 

In order to compute the gradient of the Karcher mean function, the results in 
Section [5] require that after computing the tangent directions & € 2Q 4 Gr mn in 
|47|, one needs to parallel transport them back to Tp Gr mj „, i.e. e^'^l & e - ^*'*^ 



fp Gr m .„. The gradient of the Karcher mean function is simply the sum of all 



the parallel transported vectors in Tp Gr m! „ according to (51) 
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Algorithm 2: Riemannian gradient of the Karcher mean on Gr m „ 

Input : Yi £ St m ,„ for i = l, . . . ,N and [Xi X 2 ] € U„ with X x £ St. 
Step 1 : for i = 1, . . . , N do; 

Compute the SVD of X\ , Y i Y i H X 2 = [7jEjFj H ; 

Compute At := X^YiY^ X x Ui 

N 

Step 2 : Compute Z = Y, [ ] with 

-Zf = Ui [arccos VA7 0] V* € C mx ("- m ); 
Output: The Riemannian gradient 

gradF^X") = -[X, X 2 ]Z[X 1 X 2 ]" ; 



Now let N complex subspaces {Qi} C Gr mi „ be given, and let Yn £ St mjn , 
for i — 1, . . . , N, be the respective set of N unitary bases. For a given ini- 
tialization P £ Gr mj „ with its representation X € St m>n , we summarize a CG 
algorithm for computing the Karcher mean of the Qi 's in Algorithm [3] 

Algorithm 3: A CG for computing the Karcher mean on Gr m „ 
Input : Stiefel matrices Yi £ St mi „ for i = 1, . . . , N; 
Step 1 : Generate an initial guess [x[^ X% ] £ U„ and set i = 1; 
Step 2 : Compute = - gradF(X^X^ H ) using Algorithm |] ; 
Step 3 : Set i = i + 1; 

Step 4 : Update [x[ t+1) Af <- e"^^*^] [xf where a is 

computed via backtracking line search as in Algorithm ^ 
Step 5 : Update H {l+1 ^ <- -G( i+1 ) + r G Hi (a), where 

G(' i+1 ) =grad J F(A( I+1 )A( l+1 ) H ), 



and r is chosen according to Eq. (Ill; 
Step 6 : If i mod (2m(n - m) - 1) = 0, set <- -G*( i+1 ); 

Step 7 : If is small enough, stop. Otherwise, go to Step 3; 

As mentioned in Section [3j instead of employing a backtracking line search 
for selecting an optimal step size at each conjugate direction in Step 4 in Al- 
gorithm 3, one can use a one dimensional Newton step instead. Applying this 
approach to a general Grassmannian requires the calculation of the first and 
second derivatives of eigenvalues and eigenvectors of a Hermitian matrix valued 



function Y^P(t)Yi, cf. (44). Unfortunately, this approach is not well defined 
when the corresponding eigenvalues are multiple. 

In the rest of this section, we derive a Newton step size selection as in for 
the special case where m = 1, i.e. the complex projective space CP" -1 = Gri.„. 
Let Qi = yiy^ £ CP™ -1 , i = 1, . . . , N be a given set of data points with yi £ C" 
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statisfying = 1. We define 

\(t):=y?P(t) yi , (57) 



where P(t) is defined by (21 ), and by abuse of notation we set Xi — A, (Q). The 
first and second derivatives of the Karcher mean F can be computed as 

^(FoP)(E)| e=0 = tr(grad J F(P(0))P(0)), (58) 

and 

£i(FoP)(e)\ £ =o = ^2EIUarccos 2 



£ = 

= 2^ Eti TOOS VW A ^ (e)i L, (59) 



ov-m I A 2(A i -A 2 )A i -A 2 (l-2A i ) n~ 

2 Ei=i ( a^a? 2 (Va,-aV arccos V* I ^ 



respectively. Finally, a one dimensional Newton step can be computed according 
to Eq. 0. 



7. An application in blind identification 

We outline how the above described computation of the Karcher mean can 
be applied to the problem of Blind Identification (BI). A simple instantaneous 
BI model assumes that the observation is a linear combination of some unknown 
sources, i.e. 

w(t) = As(t), (60) 

where A £ c nxn j s the full rank system matrix and w(t) presents n observed 
linear mixtures of n sources s(t). Blind identification aims to estimate the 
system matrix A and various algorithms have been developed for this task, cf. 
[2"9l [50] , Recall, that the system matrix A can be estimated only up to an 
arbitrary complex scaling and permutation of the columns, cf. |30j . Thus it is 
reasonable to consider estimates of columns of A as elements in the complex 
projective space CP" -1 := Gri jn . We refer to [H| for an approach on how to 
include the full rank constraint of A into this setting. 

It is known that performance of BI methods are sensitive to the distribution 
of noise, cf. [33 [35]. In other words, different distributions for the noise might 
lead to different optimal estimation of the system matrix. In particular, when 
system noise is present that varies over time, the matrix A can no longer be 
guaranteed to be estimated correctly by a single process. To overcome this 
difficulty, an intuitive idea is to simply average over sub-optimal estimations of 
the system. A similar approach has been investigated in [33) . where a Karcher 
mean based method is proposed for solving the real and whitened ICA problem. 
Unfortunately, its applications are limited to the cases with stationary signals 
and noise. 
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Our experiments employ the following noisy model 

w i (t) = (A + e i Z i )s i (t), i = l,...,N, (61) 

where A is the ground truth system matrix, Zi € C nxn models system noise, 
£i > represents the noise level, and Si(t) denotes the unknown signals. Both, 
real and imaginary parts of all entries of the ground truth system matrix A are 
drawn from a normal distribution. Perturbations Zi are applied to the system 
matrix, where the real and imaginary part of each entry of Zi are drawn from 
a uniform distribution on the interval [—0.5,0.5]. 

A popular BI algorithm is the so-called Strong Uncorrelated Transform 
(SUT), cf. j34j. It uses the assumption that the source signals are uncorre- 
lated and non-circular with distinct circularity coefficients. A joint diagonalizer 
of both the covariance and the pseudo-covariance matrix of the observations 
serves as an estimation of the inverse of the system matrix A. We employ the 
SUT for each of the N subproblems to get N estimates of A. After apply- 
ing a suitable preprocess, we may assume that corresponding columns of the 
estimates are aligned, cf. |35j for more details, and thus we neglect the permu- 
tation ambiguity in our experiments. The overall estimate is given by column 
wise computing the Karcher mean of the solutions of the sub-problems. Iden- 
tification performance of the proposed method is measured by the normalized 
Amari error, cf. [36], defined as 

/ m E \b<j\ m E \bij\ \ 

AAA) := i £ + £ ^twiA ~ 2 ' ( 62 ) 

\ 1 = 1 j j = l i J 

where A is an estimation of A, and B = (&ij)™ =1 = A~ x A. In general, the 
smaller the Amari error, the better the identification of the system matrix. 

In our experiments, we compare the proposed Karcher mean method to a 
simple standard approach, referred to as the Euclidean mean approach. Thereby, 
all solutions produced by SUT are summed up and the columns of the obtained 
matrix are normalized to have unit norm. First of all, we investigate the per- 
formance of both methods against the number of estimations N. We fix n = 5 
and run N — 100 experiments per number of estimations. As the box plots 
of Amari errors in Figure [3] and Figure [4] suggest, both Karcher and Euclidean 
subspace averaging methods admit a consistently increasing performance with 
an increasing number of estimations. In our second experiment, we choose a 
various number of noise levels q € {1,0.5,0.2,0.1,0.01} and fix the number of 
estimations N — 10. As shown in Figure [5j the Karcher mean approach outper- 
forms the Euclidean counterpart consistently and its advantage is considerably 
higher, the more noise is present. 

8. Conclusion 

This work focuses on the problem of averaging complex subspaces of equal di- 
mension by computing their Karcher mean, which is under mild assumptions the 
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Figure 3: Performance of subspace averaging via Karcher mean. 
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Figure 4: Performance of subspace averaging via Euclidean mean. 

unique minimum of a well denned smooth function on the complex Grassman- 
nian. An accessible introduction to the geometric structure of the Grassmanian 
is provided by its identification with the set of Hermitian projectors. In par- 
ticular, explicit formulas for geodesies, parallel transport, and the Riemannian 
gradient of the Karcher mean function are given, which, in contrast to other 
formulas available in the literature, are well suited for implementation. 
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Figure 5: Comparison of performance (left = Euclidean, right = Karcher). 



These results are used to propose an intrinsic conjugate gradient algorithm 
on the Grassmannian for computing the Karcher mean. We present experiments 
and outline the usability of such an approach in blind identification. 
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